Biostar

11,668 results • Page 1 of 234

Hello, I have a list of ~1,300 single bp sites and a fully annotated genome. I'd like to create a fasta file with only the 1,300 sites (with ±300 bp on each side). My sites are in an Excel file right now with chromosome, position

bedtools

updated 2 hours ago • Anita
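
One possible approach to the question above, as a minimal sketch: it assumes the Excel sheet has been exported to rows of chromosome and 1-based position (a hypothetical layout, not taken from the post). The function names `site_to_bed` and `sites_to_bed_lines` are illustrative only. Each site is turned into a 0-based, half-open BED interval spanning ±300 bp, clipped at the chromosome start.

```python
def site_to_bed(chrom, pos, flank=300, chrom_len=None):
    """Return (chrom, start, end) as a 0-based, half-open BED interval.

    `pos` is assumed to be a 1-based coordinate; the interval covers
    pos - flank .. pos + flank inclusive, clipped to the chromosome.
    """
    start = max(0, pos - 1 - flank)   # convert to 0-based and clip at 0
    end = pos + flank                 # half-open end equals 1-based last base
    if chrom_len is not None:
        end = min(end, chrom_len)     # clip at chromosome end when known
    return chrom, start, end


def sites_to_bed_lines(rows, flank=300):
    """Format (chrom, pos) rows as tab-separated BED lines."""
    lines = []
    for chrom, pos in rows:
        c, s, e = site_to_bed(chrom, int(pos), flank)
        lines.append(f"{c}\t{s}\t{e}")
    return lines


if __name__ == "__main__":
    # Hypothetical example sites; real input would come from the exported sheet.
    for line in sites_to_bed_lines([("chr1", 1000), ("chr2", 100)]):
        print(line)
```

The resulting BED file could then be given to `bedtools getfasta` (for example `bedtools getfasta -fi genome.fa -bed sites.bed -fo sites.fa`, with placeholder file names) to extract the sequences.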

Hello everyone, I have a genome FASTA file that has 16,941 sequences. Here is an example of my "genome.fasta":

```
>scf7180000026027
GAATGCATACTGCATCGATA
...
>scf7180000026030
TGCCCAAGTTGTGAAGTGTC
```

I want to find identical sequences in this genome FASTA file and return their IDs. My final goal is to find and remove any identical sequences present in my genome FASTA file

fasta

updated 5 hours ago • Sony
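
A minimal sketch of one way to tackle the question above, using only the Python standard library. It assumes "identical" means exact, case-insensitive sequence equality (reverse complements are not considered), and the helper names `read_fasta` and `duplicate_groups` are hypothetical. Sequences are grouped by their content, and only groups with more than one ID are reported.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # ID is the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def duplicate_groups(records):
    """Return lists of IDs that share exactly the same sequence."""
    groups = defaultdict(list)
    for seq_id, seq in records:
        groups[seq.upper()].append(seq_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Toy input with made-up IDs; a real run would pass an open file handle.
    example = [">a", "ACGT", "acgt", ">b", "ACGTACGT", ">c", "TTTT"]
    print(duplicate_groups(read_fasta(example)))
```

To remove duplicates, one could keep the first ID of each group and skip the rest when rewriting the file. For very large genomes, hashing each sequence instead of storing it as a dictionary key would reduce memory use.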

I have a Trinity assembly file in FASTA format. I want to annotate the Conus genome. Storage on my server PC is limited. Is there any way to do the annotation

annotation trinity transcriptome

updated 6 hours ago • Asim Bin Arshad

In particular, I am searching for a schematic that illustrates each step of both pipelines from fasta to vcf/maf. This blogpost https://gatk.broadinstitute.org/hc/en-us/articles/9022487952155-Structural-variant-SV-discovery

SV GATK Variant-Calling

updated 6 hours ago • Bioinformatics_begginner

and how to convert them into binary format. I have converted my files into tsv format. This is the header of my VCF file: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1_P-gDNA S1_P-cfDNA NORMAL cfDNA Could someone please help

r Upset

updated 6 hours ago • sainavyav22

pca") # Read eigenvec and eigenval files eigenvec <- read.table("eigenvec", header = FALSE) eigenval <- read.table("eigenval", header = FALSE) # Assign column names to eigenvec colnames(eigenvec) <- c("SampleID

SNPs PCA GBS LINUX r

updated 13 hours ago • Ali

infile) # Default delimiter is comma writer = csv.writer(outfile) # Write header to output file header = next(reader) writer.writerow(header + ['Decimal Latitude', 'Decimal Longitude']) # Convert each row and

Minutes Decimal offtopic Degrees python

updated 3 days ago • kuttibiotech2009

errors. 3. Use PretextView and its features for manual curation. 4. How to obtain a genome curated fasta file using the Rapid Curation pipeline. 5. Become familiar with additional tools used to curate more challenging genomes

Genome-Assembly Pretext-View Manual-Genome-Curation

updated 3 days ago • carlopecoraro2

Hello Everyone. I am working with the sra data for whole exome sequence analysis. I am facing a problem regarding the sam file that I created after alignment. I am adding all the steps. **fastq-dump --split-files SRR1178899.sra** **fastqc *.fq** **bwa mem -t 12 -Y -L 0 -M -R "@RG\tID:sample\tSM:sample\tPL:Illumina" /mnt/nas/reference_genome/BWA/mammals/hg38/genome.fa R1_step1.fq R2_step1.fq &a…

Sam Header problem file

updated 4 days ago • saifulislam99121

ln /usr/bin/cat bingo run cat test.txt bingo r cat test.txt # or run it d bingo cat test.txt ``` ### rename an executable file ```bash bingo mv <old_name> <new_name> ``` ### delete an executable file only file in `$HOME/.bingo/bin` can be removed

bingo

updated 4 days ago • dwpeng

the lexical analyzer generator; the darwin and xopen defines are # workarounds for some macOS 12 header file issues; e.g.: # sff.c:1615:19: error: implicitly declaring library function 'strdup' with type 'char *(const char *) # see also

installation troubleshooting DOCK6

updated 5 days ago • Rodolfo Adrián

input_file /opt/vep/files/${inputVcf_file} \ --output_file /opt/vep/files/${output_file} \ --fasta /opt/vep/.vep/custom/references/Homo_sapiens_assembly38.fasta \ --allele_number \ --individual all \ --per_gene Why is this

annotation vcf vep zygosity deepvariant

updated 5 days ago • asalimih

read names and insert sizes" lists an array of numbers per read pair, that does not resemble the headers (#id,numericID,insert,status,mismatches). Shown below: **This is the --outinsert:** #id numericID insert status mismatches...sized paired fastq files as separate runs, and then a shorter paired fastq file where we changed the headers. **This is what prints to screen for the…

bbtools bbmerge bbmap

updated 5 days ago • chrisk

Hello, I ran kallisto on my data and I am in the process of assigning gene names to my data. I tried to do this in 2 different ways but I get different results. The first way I tried is shown below using the t2g.py from https://github.com/pachterlab/kallisto-transcriptome-indices/releases: #Create the transcripts_to_genes file python t2g.py --use_version <homo_sapiens.grch38…

biomart RNAseq kallisto

updated 6 days ago • bioinfo

I have previously used the BioMart web portal to download FASTAs for the 3' UTRs of a list of Ensembl stable gene IDs. Typically I limit my output to "MANE Select" as I am trying to get just one

utr biomart

updated 6 days ago • RNAseqer

I am seeking help with Augustus gene prediction! I am performing a whole genome assembly of a plant species. I have completed the gene prediction using the Augustus pipeline. The output file is in `.gff` format. Now I want to perform the gene annotation by running `BLAST`, for which I need the coding sequences in a `.fasta` file. This is the approach I have thought of. …

augustus annotation assembly genome

updated 7 days ago • Vijith

files only exist with UCSC chromosome nomenclature, but not for Ensembl. I know there are ways to rename these files, but since they have so many non-standard contigs, I have the feeling that might get a little messy. So, my current...since my BAM files are also trimmed to the CDS of some genes that are all on the main chromosomes. Renaming these is straightforward. However, I don't know, if M…

Mutect2

updated 7 days ago • gernophil

contigs [M::process] read 296298 sequences (20000115 bp)... [main_samview] fail to read the header from "-". [W::hts_set_opt] Cannot change block size for this format samtools sort: failed to read header from "-" Your insights

Samtools bam

updated 7 days ago • Vahid

Hi, I am looking for a fasta file that contains mouse rRNA sequences, but I noticed that the links I searched on the internet point to some different

fasta mm10 rRNA

updated 7 days ago • octpus616

W::bcf_hrec_check] Invalid tag name: "1000gALT" [W::vcf_parse_info] INFO '.' is not defined in the header, assuming Type=String [W::bcf_hrec_check] Invalid tag name: "." Error encountered while parsing the input at 1:121387974…

sort bcftools GLNexus merge VCF

updated 8 days ago • Matteo Ungaro

Hello everyone, I'm new to RStudio, and I'm a little bit stuck. I'm trying to run the CIBERSORT code for the deconvolution of RNAseq samples using the LM22 signature matrix provided. I did a previous differential analysis with DESeq2, and used the normalized matrix from that analysis to run the CIBERSORT script. Here is my code: `if (!require(CIBERSORT))devtools::install_github("Moonerss/CIBERSO…

studio Cibersort R

updated 9 days ago • Azra

longest ORF in that identified sequence? Identify all repeats in a sequence for all sequences in the FASTA, along with how many times each repeat occurs and which is the most frequent repeat.” The primary problem I think I have...is that I don’t know how to reference the sequences inside a FASTA file beyond what I have already, so my has_codon section of code isn’t working like I think it should…

Python ORF FASTA Biopython

updated 11 days ago • cput

broker_name,sample_title,nominal_sdev,first_created&format=tsv&download=true&limit=0 headers = {"User-Agent": generate_user_agent()} download = s.get(url, headers=headers, allow_redirects=True) with open((os.path.join

ena python

updated 11 days ago • Giulia

score-client view --object-id 28358cf3-fba0-51a3-8b93-104bd5d48b23 --reference-file /home/victor/ref-fasta/GRCh38_full_analysis_set_plus_decoy_hla.fa --output-dir /media/victor/c1d5c312-b546-4d5e-b24f-72dbe9e6f18f/javier_CPTAC...per_patien/test However, it only gives me the header and as a SAM file. Has anyone used the query option and obtained the correct results

icgc samtools cram

updated 11 days ago • Javier

the issue but still I get 1.3x greater than hisat. code below: ``` # Run HISAT2 ....... # extract header from bam and save to sam file ….. #extract uniquely concordant reads samtools view sample-sorted.bam | \ awk 'BEGIN{FS="\t";OFS...if ($NF=="NH:i:1" && $(NF-2)=="YT:Z:CP"){print $0}}' > \ sample-for-subread.sam # merge header and sam file above …. # sam to bam …. #s…

RNA-seq featureCounts HISAT

updated 12 days ago • Prawesh

paired-end reads for a single plant sample that I have assembled using megahit, resulting in a FASTA file of contigs. This will act as my "reference genome". File 2: FASTA file of contigs generated from de novo assembly of ddRADseq

SAMtools BWA alignment ddRAD

updated 12 days ago • Lemonhope

Hi, I wonder how samtools consensus works without explicitly pointing to the reference genome. If I intend to add a reference genome to generate the consensus sequence, is that possible with samtools? Thanks a lot. Reference: https://www.htslib.org/doc/samtools-consensus.html

fasta fa cram genome

updated 12 days ago • me

CPU sec, 40.282 real sec [E::sam_hrecs_update_hashes] Duplicate entry "scf7180000010076" in sam header samtools view: failed to add PG line to the header And this is the command that I ran for mapping: bwa mem -t 8 -M -R '@RG\tID:SAMPLE_PE...ERR3890922.sam I am currently using samtools 1.19.2 and BWA 0.7.17. I don't understand why the SAM header has "Duplicate entry" and what sh…

sort. SAMtools. BAM. SAM.

updated 13 days ago • Sony

Hi, I am trying to do some differential expression experiments on my bacterial strain and I am very new to the field. I aligned my (paired-end) reads with STAR to both a genome and plasmid (using 2 separate fasta files + 1 combined gff file, which was checked for identical annotation format). Afterwards I used featureCounts, but unfortunately…

STAR paired-end read

updated 13 days ago • heelpPlease

singularity exec vg.1.52.sif vg autoindex --workflow map --prefix AllRefGraph --ref-fasta Ref1.fasta Ref2.fasta Ref3.fasta Ref4.fasta Ref5.fasta Ref6.fasta Ref7.fasta Ref8.fasta Ref9.fasta Ref10.fasta

updated 14 days ago • sarumonsus

string_api_url = "https://version-11-5.string-db.org/api" output_format = "tsv-no-header" method = "interaction_partners" my_proteins = proteins['protein']) # Construct the request request_url = "/".join([string_api_url

STRING-DB protein STRING-DB-API

updated 15 days ago • brandon

WBP4/gene_expression", countFiles[i]) counts <- read.table(countFilePath, header = FALSE, col.names = c("gene", sampleNames[i])) countDataList[[i]] <- counts } # Merge all count data into a single data frame by gene

DESEQ2 logfoldchange

updated 16 days ago • adi.gershon1

Hi all, I am trying to visualize the result of gene set enrichment analysis. This is my plot and the code in R. Is there any way to change the code so that the text (names of gene sets) is sorted as in the example plot? Also, I want a square border around the plot. Here is my code: data <- read.csv("GSEA_visualize.csv", header = TRUE, sep = ",") # Load required librar…

barplot RNA-seq GSEA enrichment

updated 16 days ago • Rob

org.Hs.eg.db) library(dplyr) library(edgeR) mat<-read.table("~/Downloads/BRCA_exp_matrix.tsv",header=TRUE,sep="\t",fill=TRUE) library(readr) clinical <- read.table("~/Downloads/clinical_info_TCGA-BRCA.tsv", sep = "\t", na.strings

DGEList R

updated 16 days ago • Natali

can't handle the BAM files due to memory issues). I considered using bcftools consensus to generate FASTA files from the VCFs, but HLA typing software requires reads (FASTQ or BAM files), and I haven't found a way to obtain those

formats vcf fastaq HLA_imputation HLA_typing

updated 17 days ago • Javier

f"/*.sra ./ ; done < directories.txt # convert all the files in frw and rev fasta formats: fastq-dump --split-files *.sra </sample

SRA DADA2 metabarcoding

updated 18 days ago • Begonia_pavonina

files of which the file named `file.fasta.masked` is of the same size as the original input fasta file, another file named `file.fasta.out` is of ~700mb, and a third file named `file.fasta.tbl` . I understand that `file.fasta.masked

sequence annotation repeatmasker illumina assembly

updated 18 days ago • Vijith

are from different plasmid families (Rep types). I have exported each individual plasmid as a unique fasta file. I was just wondering if there is a way to assess genetic relatedness (and visibly display) between these plasmid

bacteria plasmid wgs hybridassembly sequencing

updated 19 days ago • nicole.kavanagh

I generated a pssm file from psi-blast and then I am using POSSUM to generate a pse-pssm file to run a programme, ASPIRER, for identifying unconventionally secreted proteins. However, I am running into issues with it. The code I used to generate the pssm file is as follows: ``` psiblast \ -db nr \ -query /nesi/project/vuw03925/software/POSSUM_Standalone_Toolkit/input/test_lottia.fasta \ -nu…

fasta pse-pssm pssm POSSUM

updated 19 days ago • rianna.collins

I am currently trying to create a PSSM from FASTA using ncbi-blast-2.2.30+-x64-linux. I will use the [UniProt][1] and [UniRef90][2] databases for this purpose. What will be the minimum requirements

blast pssm

updated 20 days ago • Nafi

I have a large FASTA file of a new species, and I want to find and extract a particular protein sequence. I also know a protein sequence of a similar species

fasta alignment blast

updated 20 days ago • anna

I am unable to get variance in the featureCounts file. I think there is a problem with the reference genome file and GTF file. I downloaded them from NCBI, but I didn't get satisfactory results. Could anyone please suggest what the error might be?

Arabidopsis-thaliana gtf reference-genome

updated 20 days ago • Ravita

based analysis of transcriptomic data in Galaxy Australia. I have downloaded the reference genome FASTA file and GTF file for Arabidopsis thaliana from NCBI. I have successfully mapped the raw reads to the reference genome

featureCounts

updated 20 days ago • Ravita

Hello everyone! Is there a way to introduce specific SNPs into the FASTA sequence of a gene? I am working on a pharmacogenetics project and I want to simulate reads for specific haplotypes of...PGX genes. T…

simulation snps haplotypes pharmacogenetics

updated 20 days ago • Riccardo

1.0" encoding="UTF-8"?><!DOCTYPE Query> <query count="" datasetconfigversion="0.6" formatter="TSV" header="0" uniquerows="0" virtualschemaname="plants_mart"><dataset interface="default" name="athaliana_eg_gene"><attribute name…

biomart plant ensembl

updated 20 days ago • Dora

research paper. I have seen that many PDB entries have fewer residues in a chain of a protein than the full FASTA sequence. Most likely, the cause is that those residues were left unmodeled because they went missing during the crystallization phase...1ZM1][2]. Here in chain B, the last couple of residues were unmodeled. Should I use the shortened FASTA from PDB or should I use the full FASTA for my dataset? [1…

PDB FASTA

updated 21 days ago • Nafi

edit feature (annotation) data annotLookup <- read.table( annotfile, header = FALSE, sep = '\t', stringsAsFactors = FALSE, comment.char = "#", fill = TRUE) colnames(annotLookup) <- annotLookup[2,] colnames(annotLookup

Microarray Limma Agilent

updated 24 days ago • hagl

the aligned haplotig reads as well as the chromosome it mapped to, and where it aligns in the header of the fastq so I can do chromosome level analysis on the haplotype assembly. How can I create this fastq? The same haplotig

Assembly phased Haplotype Annotation

updated 24 days ago • turcoa1

I'm trying to rename my clusters in a `Seurat` object. My old cluster IDs are numbers ```r Idents(seuObj) <- 'RNA_snn_res.0.1' levels(seuObj) [1] "0" "1" "2

Seurat RenameIdents R

updated 25 days ago • Assa Yeroslaviz

Hi, I have a list of rsIDs and I want to search them against the ClinVar database and print the condition_germline column for each rsID. I already have a script. ``` use strict; use warnings; use LWP::Simple; use HTML::TableExtract; # Read list of rsids from file my $rsids_file = 'rsids.txt'; open(my $fh, '<', $rsids_file) or die "Can't open $rsids_file: $!"; my @rsids = <$…

python clinvar perl

updated 25 days ago • ashaneev07

